Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
Very deep convolutional networks have been central to the largest advances in
image recognition performance in recent years. One example is the Inception
architecture that has been shown to achieve very good performance at relatively
low computational cost. Recently, the introduction of residual connections in
conjunction with a more traditional architecture has yielded state-of-the-art
performance in the 2015 ILSVRC challenge; its performance was similar to the
latest generation Inception-v3 network. This raises the question of whether
there is any benefit in combining the Inception architecture with residual
connections. Here we give clear empirical evidence that training with residual
connections accelerates the training of Inception networks significantly. There
is also some evidence of residual Inception networks outperforming similarly
expensive Inception networks without residual connections by a thin margin. We
also present several new streamlined architectures for both residual and
non-residual Inception networks. These variations improve the single-frame
recognition performance on the ILSVRC 2012 classification task significantly.
We further demonstrate how proper activation scaling stabilizes the training of
very wide residual Inception networks. With an ensemble of three residual and
one Inception-v4, we achieve 3.08 percent top-5 error on the test set of the
ImageNet classification (CLS) challenge.
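
The activation scaling mentioned above can be illustrated with a minimal sketch: the output of the residual branch is multiplied by a small constant before it is added back to the input. The function names, the toy branch, and the scale value of 0.1 are assumptions for illustration only; the abstract does not give the exact formulation or constants.

```python
# Minimal sketch of a scaled residual connection (illustrative assumptions only).

def scaled_residual(x, branch, scale=0.1):
    """Return x + scale * branch(x), element-wise over a list of floats.

    `scale` damps the residual branch before the addition; the value 0.1
    is a hypothetical choice, not one taken from the abstract.
    """
    fx = branch(x)
    return [xi + scale * fi for xi, fi in zip(x, fx)]


def toy_branch(x):
    # Hypothetical stand-in for an Inception block: it just doubles the input.
    return [2.0 * xi for xi in x]


activations = [1.0, -2.0, 3.0]
unscaled = scaled_residual(activations, toy_branch, scale=1.0)
scaled = scaled_residual(activations, toy_branch, scale=0.1)
print(unscaled)
print(scaled)
```

With scale=1.0 the branch dominates the sum, while a small scale keeps the output close to the identity path, which is the intuition behind using scaling to stabilize training.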
Deep Convolutional Neural Network to Detect J-UNIWARD
This paper presents an empirical study on applying convolutional neural
networks (CNNs) to detecting J-UNIWARD, one of the most secure JPEG
steganographic methods. Experiments guiding the architectural design of the CNNs
have been conducted on the JPEG compressed BOSSBase containing 10,000 covers of
size 512x512. Results have verified that both the pooling method and the depth
of the CNNs are critical for performance. Results have also proved that a
20-layer CNN, in general, outperforms the most sophisticated feature-based
methods, but its advantage gradually diminishes on hard-to-detect cases. To
show that the performance generalizes to large-scale databases and to different
cover sizes, one experiment has been conducted on the CLS-LOC dataset of
ImageNet containing more than one million covers cropped to a unified size of
256x256. The proposed 20-layer CNN has cut the error achieved by a CNN recently
proposed for large-scale JPEG steganalysis by 35%. Source code is available via
GitHub: https://github.com/GuanshuoXu/deep_cnn_jpeg_steganalysis
Comment: Accepted by IH&MMSec 2017. This is a personal cop…
Rethinking the Inception Architecture for Computer Vision
Convolutional networks are at the core of most state-of-the-art computer
vision solutions for a wide variety of tasks. Since 2014, very deep
convolutional networks started to become mainstream, yielding substantial gains
in various benchmarks. Although increased model size and computational cost
tend to translate to immediate quality gains for most tasks (as long as enough
labeled data is provided for training), computational efficiency and low
parameter count are still enabling factors for various use cases such as mobile
vision and big-data scenarios. Here we explore ways to scale up networks in
ways that aim at utilizing the added computation as efficiently as possible by
suitably factorized convolutions and aggressive regularization. We benchmark
our methods on the ILSVRC 2012 classification challenge validation set and
demonstrate substantial gains over the state of the art: 21.2% top-1 and 5.6%
top-5 error for single frame evaluation using a network with a computational
cost of 5 billion multiply-adds per inference and fewer than 25
million parameters. With an ensemble of 4 models and multi-crop evaluation, we
report 3.5% top-5 error on the validation set (3.6% error on the test set) and
17.3% top-1 error on the validation set.
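
To make the idea of factorized convolutions concrete, the sketch below counts weights for a large filter and for stacks of smaller filters. The helper name, the channel count of 64, and the specific factorizations compared are illustrative assumptions; the abstract does not state which factorizations the network uses.

```python
# Weight-count arithmetic for factorized convolutions (illustrative sketch).

def conv_weights(kernel_h, kernel_w, c_in, c_out):
    """Number of weights in a convolution layer, ignoring biases."""
    return kernel_h * kernel_w * c_in * c_out


channels = 64  # hypothetical channel count, kept constant across layers

# One 5x5 convolution versus two stacked 3x3 convolutions.
full_5x5 = conv_weights(5, 5, channels, channels)
two_3x3 = 2 * conv_weights(3, 3, channels, channels)

# One 3x3 convolution versus a 1x3 convolution followed by a 3x1 convolution.
full_3x3 = conv_weights(3, 3, channels, channels)
asymmetric = conv_weights(1, 3, channels, channels) + conv_weights(3, 1, channels, channels)

print(full_5x5, two_3x3, two_3x3 / full_5x5)
print(full_3x3, asymmetric, asymmetric / full_3x3)
```

Under these assumptions the two stacked 3x3 layers need 72% of the weights of the 5x5 layer, and the 1x3 plus 3x1 pair needs about 67% of the weights of the 3x3 layer.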
Deep Convolutional Ranking for Multilabel Image Annotation
Multilabel image annotation is one of the most important challenges in
computer vision with many real-world applications. While existing work usually
uses conventional visual features for multilabel annotation, features based on
Deep Neural Networks have shown potential to significantly boost performance.
In this work, we propose to leverage the advantage of such features and analyze
key components that lead to better performance. Specifically, we show that a
significant performance gain can be obtained by combining convolutional
architectures with approximate top-k ranking objectives, as they naturally
fit the multilabel tagging problem. In our experiments on the NUS-WIDE dataset,
this approach outperforms conventional visual features by about 10%, obtaining
the best reported performance in the literature.
Norm-in-Norm Loss with Faster Convergence and Better Performance for Image Quality Assessment
Currently, most image quality assessment (IQA) models are supervised by the
MAE or MSE loss with empirically slow convergence. It is well-known that
normalization can facilitate fast convergence. Therefore, we explore
normalization in the design of loss functions for IQA. Specifically, we first
normalize the predicted quality scores and the corresponding subjective quality
scores. Then, the loss is defined based on the norm of the differences between
these normalized values. The resulting "Norm-in-Norm" loss encourages the IQA
model to make linear predictions with respect to subjective quality scores.
After training, least squares regression is applied to determine the linear
mapping from the predicted quality to the subjective quality. It is shown that
the new loss is closely connected with two common IQA performance criteria
(PLCC and RMSE). Through theoretical analysis, it is proved that the embedded
normalization makes the gradients of the loss function more stable and more
predictable, which is conducive to the faster convergence of the IQA model.
Furthermore, to experimentally verify the effectiveness of the proposed loss,
it is applied to solve a challenging problem: quality assessment of in-the-wild
images. Experiments on two relevant datasets (KonIQ-10k and CLIVE) show that,
compared to MAE or MSE loss, the new loss enables the IQA model to converge
about 10 times faster and the final model achieves better performance. The
proposed model also achieves state-of-the-art prediction performance on this
challenging problem. For reproducible scientific research, our code is publicly
available at https://github.com/lidq92/LinearityIQA.
Comment: Accepted by ACM MM 2020, + supplemental material
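
The two-step loss described in this abstract (normalize both score sets, then take a norm of their difference) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: mean-centering, division by an L2 norm, and an L1 norm of the difference are illustrative choices, and the function names are hypothetical.

```python
# Minimal sketch of a normalize-then-norm loss (assumed details, not the official code).

def normalize(scores, q=2.0, eps=1e-8):
    """Subtract the mean, then divide by the L_q norm of the centered scores."""
    mean = sum(scores) / len(scores)
    centered = [s - mean for s in scores]
    norm = sum(abs(c) ** q for c in centered) ** (1.0 / q)
    return [c / (norm + eps) for c in centered]


def norm_in_norm_loss(predicted, subjective, p=1.0, q=2.0):
    """L_p norm of the difference between the two normalized score vectors."""
    pred_n = normalize(predicted, q)
    subj_n = normalize(subjective, q)
    return sum(abs(a - b) ** p for a, b in zip(pred_n, subj_n)) ** (1.0 / p)


subjective = [1.0, 2.0, 3.0, 4.0]
linear_pred = [10.0 * s + 5.0 for s in subjective]  # positive linear map of the targets
reversed_pred = [4.0, 3.0, 2.0, 1.0]                # ranks the images in reverse

loss_linear = norm_in_norm_loss(linear_pred, subjective)
loss_reversed = norm_in_norm_loss(reversed_pred, subjective)
print(loss_linear, loss_reversed)
```

Because the normalization removes offset and scale, any prediction that is a positive linear function of the subjective scores gives a loss near zero, which matches the statement that the loss encourages linear predictions.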
Fake news identification on Twitter with hybrid CNN and RNN models
The problem of fake news propagation continues to grow at an alarming scale. This trend has generated much interest in politics, academia, and industry alike. We propose a framework that detects and classifies fake news messages from Twitter posts using a hybrid of convolutional neural networks and long short-term recurrent neural network models. The proposed deep learning approach achieves 82% accuracy. Our approach intuitively identifies relevant features associated with fake news stories without previous knowledge of the domain.
Learning to Find Eye Region Landmarks for Remote Gaze Estimation in Unconstrained Settings
Conventional feature-based and model-based gaze estimation methods have
proven to perform well in settings with controlled illumination and specialized
cameras. In unconstrained real-world settings, however, such methods are
surpassed by recent appearance-based methods due to difficulties in modeling
factors such as illumination changes and other visual artifacts. We present a
novel learning-based method for eye region landmark localization that enables
conventional methods to be competitive with the latest appearance-based methods.
Despite having been trained exclusively on synthetic data, our method exceeds
the state of the art for iris localization and eye shape registration on
real-world imagery. We then use the detected landmarks as input to iterative
model-fitting and lightweight learning-based gaze estimation methods. Our
approach outperforms existing model-fitting and appearance-based methods in the
context of person-independent and personalized gaze estimation.